Skip to content

Batch ContentSummaryLog queries in _consolidate_courses_data#14444

Merged
rtibbles merged 1 commit into
learningequality:developfrom
devswithme:optimize/batch-course-progress-summary-queries
Jun 17, 2026
Merged

Batch ContentSummaryLog queries in _consolidate_courses_data#14444
rtibbles merged 1 commit into
learningequality:developfrom
devswithme:optimize/batch-course-progress-summary-queries

Conversation

@devswithme

@devswithme devswithme commented Mar 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Reduces database queries in the learn plugin at _consolidate_courses_data from 2N to N+1, where N is the number of courses a learner is enrolled in. Previously, for each course node the function issued two separate queries: one COUNT for non-topic descendants and one COUNT on ContentSummaryLog for completed content. On a learner's home page or course list with 10 enrolled courses, this produced 20 queries.

References

Closes #14453.

Reviewer guidance

  • The only changed function is _consolidate_courses_data in
    kolibri/plugins/learn/viewsets.py.
  • Run pytest kolibri/plugins/learn/test/test_learner_course.py and
    pytest kolibri/plugins/learn/test/test_learner_classroom.py to
    confirm all existing tests pass.
  • Completed content is counted as the distinct set of matching
    ContentSummaryLog content_ids (set(descendant_ids) & completed),
    preserving the original filter(content_id__in=...).count() semantics
    when a course contains the same content_id in more than one node.

AI usage

I used Claude to identify the N+1 query pattern and propose the batching approach and reviewed the generated code for correctness specifically verifying UUID type consistency between ContentNode and ContentSummaryLog, and the empty-set guard, and confirmed the change produces identical output across all existing test cases before accepting it.

@rtibbles

Copy link
Copy Markdown
Member

Hi @devswithme - could you open an issue for this so we can discuss further before jumping to a solution? I agree with the assessment that there is some potential for optimization here, but I'd rather plan it out a bit more carefully.

@devswithme

Copy link
Copy Markdown
Contributor Author

Yes can, Thank you @rtibbles.

@learning-equality-bot

Copy link
Copy Markdown

📢✨ Before we assign a reviewer, we'll turn on @rtibblesbot to pre-review. Its comments are generated by an LLM, and should be evaluated accordingly.

@rtibblesbot rtibblesbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean optimization with correct semantics. CI passing. No UI files changed.

  • suggestion: The cross-course batching has no multi-course progress test — see inline.
  • praise: Empty-set guard and deduplication — see inline.

@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Reviewed the pull request diff checking for:

  • Correctness: bugs, edge cases, undocumented behavior, resource leaks, hardcoded values
  • Design: unnecessary complexity, naming, readability, comment accuracy, redundant state
  • Architecture: duplicated concerns, minimal interfaces, composition over inheritance
  • Testing: behavior-based assertions, mocks only at hard boundaries, accurate coverage
  • Completeness: missing dependencies, unupdated usages, i18n, accessibility, security
  • Principles: DRY (same reason to change), SRP, Rule of Three (no premature abstraction)
  • Checked CI status and linked issue acceptance criteria
  • For UI changes: inspected screenshots for layout, visual completeness, and consistency

Comment thread kolibri/plugins/learn/viewsets.py Outdated
)

total_content = content_qs.count()
all_content_ids = {cid for ids in course_content_map.values() for cid in ids}

@rtibblesbot rtibblesbot Apr 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — addressed in the current code.

suggestion: all_content_ids merges content across all N courses, but the test suite only has a single-course progress test (test_learner_course_progress_calculated_correctly). The multi-course batching path — where two courses share the same completed_ids set and each gets its own per-course intersection — has no direct test. Consider a test with 2 courses having different completion states to confirm the batched lookup produces the same per-course progress values as the old per-course queries did.

Comment thread kolibri/plugins/learn/viewsets.py Outdated
total_content = content_qs.count()
all_content_ids = {cid for ids in course_content_map.values() for cid in ids}
completed_ids = set()
if all_content_ids:

@rtibblesbot rtibblesbot Apr 22, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved — addressed in the current code.

praise: Good guard — skips the ContentSummaryLog query entirely when no courses have any content. Also worth noting: using set() on the query result means the new code correctly handles the edge case where ContentSummaryLog has multiple rows for the same (user, content_id) pair (the model has no unique constraint). The old .count() approach would have overcounted in that scenario; the new approach counts each logical completion exactly once.

@akolson akolson requested a review from rtibbles May 4, 2026 19:08
…_data

The learn home/course consolidation issued two queries per enrolled
course: a descendant COUNT and a ContentSummaryLog COUNT for completed
content (2N total). Gather each course's descendant content ids in a
single pass and resolve completed content across all courses in one
ContentSummaryLog query (N+1).

Output is unchanged: total stays the descendant row count and completed
stays the distinct count of matching summary logs (set intersection),
matching the original filter(content_id__in=...).count() semantics.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rtibbles rtibbles force-pushed the optimize/batch-course-progress-summary-queries branch from 2afebc7 to b0532aa Compare June 17, 2026 02:31
@github-actions github-actions Bot added DEV: backend Python, databases, networking, filesystem... APP: Learn Re: Learn App (content, quizzes, lessons, etc.) SIZE: small labels Jun 17, 2026

@rtibblesbot rtibblesbot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 prior findings open; all acknowledged — maintainer approved with the IN-clause SQLite variable-count tradeoff noted.

Prior-finding status

ACKNOWLEDGED — kolibri/plugins/learn/viewsets.py:162 — Correct handling of duplicate content_ids [praise]
ACKNOWLEDGED — kolibri/plugins/learn/viewsets.py:152 — IN clause grows with total enrolled content, not per-course [suggestion]


@rtibblesbot's comments are generated by an LLM, and should be evaluated accordingly

How was this generated?

Ran an automatic code-only delta review triggered by new commits on a previously reviewed PR:

  • Retrieved prior bot reviews via the GitHub API
  • Classified each prior finding as RESOLVED, UNADDRESSED, ACKNOWLEDGED, or CONTESTED
  • Only raised NEW findings for newly introduced code
  • Core review pass only — specialized frontend/backend lenses and manual QA run when a review is explicitly requested
  • Synthesized one review from the passes and chose the verdict from the findings, CI status, and QA evidence

@rtibbles rtibbles merged commit 110e4a4 into learningequality:develop Jun 17, 2026
70 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

APP: Learn Re: Learn App (content, quizzes, lessons, etc.) DEV: backend Python, databases, networking, filesystem... SIZE: small

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement]: Batch ContentSummaryLog queries in _consolidate_courses_data

3 participants